Using Decision List for Farsi Word Sense Disambiguation
Authors
Abstract
This paper describes Farsi word sense disambiguation in unrestricted text using a decision list. A decision list is a rule-based algorithm that searches the training data for discriminatory features and extracts a set of rules, which are then used to disambiguate word senses. Because this is a supervised, corpus-based method, it needs a Farsi sense-tagged corpus; we used a raw corpus and manually labeled a subset of it. To evaluate the method, we applied it to 20 Farsi homographs. Comparison of the disambiguation results with baselines shows the effectiveness of the method. The method was also compared with K Nearest Neighbor (KNN), an exemplar-based method. All evaluations used 10-fold cross-validation.
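As a rough illustration of how a decision list extracts and applies rules, here is a minimal Python sketch. The abstract does not say which features are used or how rules are ranked, so several choices here are assumptions: features are simply the words surrounding the target, rules are ranked by a smoothed log-likelihood ratio (a common formulation of decision lists, not necessarily the one used in the paper), and the most frequent sense serves as the fallback. The function names `train_decision_list` and `disambiguate` and the toy training data (English glosses standing in for contexts of a Farsi homograph with the senses "milk" and "lion") are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, smoothing=0.1):
    """Build a ranked rule list from (context_words, sense) pairs.

    Assumption: each rule is (score, feature, sense), where score is a
    smoothed log-likelihood ratio of the sense versus all other senses.
    """
    sense_counts = Counter(sense for _, sense in examples)
    default_sense = sense_counts.most_common(1)[0][0]

    # Count, for every context word, how often it co-occurs with each sense.
    feature_sense = defaultdict(Counter)
    for words, sense in examples:
        for word in set(words):
            feature_sense[word][sense] += 1

    rules = []
    for feature, counts in feature_sense.items():
        total = sum(counts.values())
        for sense in sense_counts:
            hits = counts[sense] + smoothing
            misses = (total - counts[sense]) + smoothing
            score = math.log(hits / misses)
            if score > 0:  # keep only rules that favor this sense
                rules.append((score, feature, sense))

    # Most discriminatory rules come first.
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules, default_sense


def disambiguate(rules, default_sense, context_words):
    """Return the sense of the first (highest-ranked) rule that matches."""
    context = set(context_words)
    for _score, feature, sense in rules:
        if feature in context:
            return sense
    return default_sense  # no rule fired: fall back to the most frequent sense


# Toy, invented training data for illustration only.
train = [
    (["drink", "cold", "glass"], "milk"),
    (["glass", "breakfast"], "milk"),
    (["drink", "breakfast"], "milk"),
    (["jungle", "roar"], "lion"),
    (["zoo", "cage", "roar"], "lion"),
]
rules, default_sense = train_decision_list(train)
print(disambiguate(rules, default_sense, ["roar", "night"]))
print(disambiguate(rules, default_sense, ["street"]))
```

Only the single highest-ranked matching rule decides the sense. This is what separates a decision list from methods that combine evidence from all features, such as the exemplar-based KNN the abstract compares against.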
Similar resources
Word Sense Disambiguation of Ambiguous Farsi Words Using the LDA Topic Model
Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...
Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
In this paper, we address the shortage of evaluation benchmarks for the Persian (Farsi) language by creating and making available a new benchmark for English-to-Persian Cross-Lingual Word Sense Disambiguation (CL-WSD). In creating the benchmark, we follow the format of the SemEval 2013 CL-WSD task, so that the tools introduced for that task can also be applied to the benchmark. In fact, the new benc...
Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation
Word sense disambiguation is used in many natural language processing fields. One approach to disambiguation is the decision list algorithm, which is a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck, which means that their efficiency depends on the size of the tagg...
Detection of Japanese Homophone Errors by a Decision List Including a Written Word as a Default Evidence
In this paper, we propose a practical method to detect Japanese homophone errors in Japanese texts. Detecting homophone errors is very important in Japanese revision systems because Japanese texts frequently suffer from them. To detect homophone errors, we need only solve the homophone problem. We can use the decision list to do it because the homophone problem is equ...
Influence of Morphology in Word Sense Disambiguation for Tamil
Many Word Sense Disambiguation (WSD) algorithms do not take into account the morphological variations in the language. However, because Indian languages are highly inflected and morphologically rich, we investigate whether morphology must be taken into account for WSD in these languages. This paper analyses the influence of morphology in WSD for Tamil. We believe our results ...